Inferring Site-Specific Evolutionary Rates: Bayesian Methods are Superior
Abstract
Sites in a protein sequence do not all evolve at the same rate. The working hypothesis is that evolutionarily conserved sites mark functionally important regions of the protein. Identifying conserved regions therefore informs functional annotation, drug design, quaternary structure prediction, and protein biochemistry [1]. Conservation levels are typically inferred from a multiple sequence alignment (MSA) of homologous proteins. Recently, ‘Rate4Site’ [2] was developed as a tool for identifying functional regions in proteins using the maximum likelihood (ML) criterion. Bayesian rate inference is a plausible alternative, and it comes in at least two variants: (1) the inferred rate is the expectation of the rate over the posterior rate distribution (BAYES-EXP), or (2) the inferred rate is the one with the highest posterior probability (BAYES-MAP). Bayesian inference requires a prior distribution of rates. The distribution most commonly used to model rate variation across sites is the gamma distribution [3], and it serves here as the prior for both the BAYES-EXP and BAYES-MAP methods. The purpose of this study is to compare, by simulation, three likelihood-based inference methods: ML, BAYES-MAP and BAYES-EXP. We studied how the number of discrete gamma categories affects the performance of the BAYES-EXP method. We also studied how evolutionary parameters such as the number of taxonomic units, branch lengths and rate distributions affect the quality of predictions. Finally, we discuss confidence intervals for inferred rates and conclude with a biological example.
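The difference between the three estimators can be illustrated with a minimal sketch. It is not the Rate4Site implementation: the site likelihood below is a toy stand-in (independent sequence pairs under a 20-state Poisson substitution model) for the tree-based likelihood the methods actually use, and the gamma prior is discretized on a uniform grid of rates, not into the discrete gamma categories mentioned in the abstract. The function names, the `pairs` input format, and the parameters `alpha`, `grid_size` and `r_max` are all hypothetical.

```python
import math


def gamma_pdf(r, alpha):
    """Density at r > 0 of a gamma distribution with shape alpha and mean 1."""
    return math.exp(alpha * math.log(alpha) - math.lgamma(alpha)
                    + (alpha - 1) * math.log(r) - alpha * r)


def site_likelihood(r, pairs):
    """Toy likelihood of one alignment column given the site rate r.

    pairs: list of (t, differs) for sequence pairs treated as independent
    (an assumption of this sketch), where t is the branch length separating
    the pair and differs tells whether their residues differ at this site.
    """
    like = 1.0
    for t, differs in pairs:
        # Probability of observing a difference under a 20-state Poisson model.
        p_diff = (19 / 20) * (1 - math.exp(-(20 / 19) * r * t))
        like *= p_diff if differs else (1 - p_diff)
    return like


def infer_rates(pairs, alpha=1.0, grid_size=400, r_max=10.0):
    """Return the ML, BAYES-MAP and BAYES-EXP rate estimates on a rate grid."""
    # Midpoints of a uniform grid on (0, r_max); all rates are strictly positive.
    rates = [r_max * (i + 0.5) / grid_size for i in range(grid_size)]
    prior = [gamma_pdf(r, alpha) for r in rates]
    like = [site_likelihood(r, pairs) for r in rates]

    # Posterior over grid points: prior times likelihood, normalized to sum to 1.
    post = [p * l for p, l in zip(prior, like)]
    total = sum(post)
    post = [p / total for p in post]

    ml = rates[max(range(grid_size), key=lambda i: like[i])]         # maximizes likelihood
    bayes_map = rates[max(range(grid_size), key=lambda i: post[i])]  # maximizes posterior
    bayes_exp = sum(r * p for r, p in zip(rates, post))              # posterior mean
    return {"ML": ml, "BAYES-MAP": bayes_map, "BAYES-EXP": bayes_exp}


if __name__ == "__main__":
    conserved = [(0.5, False)] * 4  # no pair differs at this site
    variable = [(0.5, True)] * 4    # every pair differs at this site
    print(infer_rates(conserved, alpha=2.0))
    print(infer_rates(variable, alpha=2.0))
```

In this toy setting the ML estimate runs to the edge of the grid for a fully conserved or fully variable site, because the likelihood is monotone in the rate, while the two Bayesian estimates are pulled toward the prior. This is a property of the sketch and says nothing by itself about the simulation results reported in the study.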
Similar resources
Comparison of site-specific rate-inference methods for protein sequences: empirical Bayesian methods are superior.
The degree to which an amino acid site is free to vary is strongly dependent on its structural and functional importance. An amino acid that plays an essential role is unlikely to change over evolutionary time. Hence, the evolutionary rate at an amino acid site is indicative of how conserved this site is and, in turn, allows evaluation of its importance in maintaining the structure/function of ...
Bayesian approach to inference of population structure
Methods for inferring population structure, and their applications in identifying disease models and in forecasting the physical and mental condition of human beings, are becoming increasingly important. In this article, the motivation for and significance of studying population structure are explained first. In the next section, the applications of inference of p...
A Note on Evolutionary Rate Estimation in Bayesian Evolutionary Analysis: Focus on Pathogens
Bayesian evolutionary analysis provides a statistically sound and flexible framework for estimating evolutionary parameters. In this method, posterior estimates of the evolutionary rate (μ) are derived by combining the evolutionary information in the data with the researcher’s prior knowledge about the true value of μ. Nucleotide sequence samples of fast evolving pathogens that are taken at d...
Site-specific evolutionary rates in proteins are better modeled as non-independent and strictly relative
MOTIVATION: In a nucleotide or amino acid sequence, not all sites evolve at the same rate, because selective constraints differ from site to site. Currently in computational molecular evolution, models incorporating rate heterogeneity always share two assumptions. First, the rate of evolution at each site is assumed to be independent of every other site. Second, the values of these rates are assume...
Bayesian Phylogenetic Inference from Animal Mitochondrial Genome Arrangements
The determination of evolutionary relationships is a fundamental problem in evolutionary biology. Genome arrangement data is potentially more informative than DNA sequence data for inferring evolutionary relationships among distantly related taxa. We describe a Bayesian framework for phylogenetic inference from mitochondrial genome arrangement data using Markov chain Monte Carlo methods. We app...